[RISCV] Switch to sign-extended loads if possible in RISCVOptWInstrs #144703

Open · wants to merge 3 commits into main

Conversation

asb (Contributor) commented Jun 18, 2025

This is currently implemented as part of RISCVOptWInstrs in order to reuse hasAllNBitUsers. However, a new home or further refactoring will be needed (see the end of this note).
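
As a rough illustration (not code from the patch; the registers are placeholders), hasAllNBitUsers is the check that every user of a value only reads its low N bits:

    lwu  a0, 0(a1)          # zero-extending 32-bit load
    addw a2, a0, a2         # addw only reads bits [31:0] of its source registers
    # Every user of a0 needs only the low 32 bits, so hasAllNBitUsers holds for
    # N=32 and the lwu can be rewritten to lw without changing any user's result.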

Why prefer sign-extended loads?

  • LW is compressible while LWU is not (see the sketch after this list).
  • Helps to minimise the diff vs RV32 (e.g. LWU vs LW).
  • Helps to minimise distracting diffs vs GCC. I see this come up frequently when comparing against GCC output, and in these cases it's a red herring.
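
A minimal sketch of the compression point, assuming the standard C extension encodings and placeholder registers:

    lwu a0, 0(a0)           # always 4 bytes: there is no compressed LWU
    lw  a0, 0(a0)           # may shrink to c.lw (2 bytes) when the registers
                            # and offset fit the compressed form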

Issues or open questions with this patch as it stands:

  • Doing something at the MI level makes sense as a last resort. I wonder if for some of the cases we could be producing a sign-extended load earlier on.
  • RISCVOptWInstrs is a slightly awkward home. It's currently not run for RV32, which means the LHU changes can actually add new diffs vs RV32. Potentially just this one transformation could be done for RV32.
  • Do we want to perform additional load narrowing? With the existing code, an LD will be narrowed to LW if only the lower bits are needed. The example that made me look at extending load normalisation in the first place was an LWU that immediately has a small mask applied, which could be narrowed to LB (sketched after this list). It's not clear this has any real benefit.
  • As I've put the specific home of this change as an open question, I haven't added MIR tests for RISCVOptWInstrs. I'll add that if we decide this is the right home.
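
A sketch of the narrowing case mentioned above (illustrative operands only, assuming the usual little-endian layout; this is not code from the patch):

    lwu  a0, 0(a1)          # 32-bit zero-extending load
    andi a0, a0, 15         # only bits [3:0] of the loaded value are used
    # could in principle become
    lb   a0, 0(a1)          # the low byte is identical, and the mask discards
    andi a0, a0, 15         # every bit the narrower load could change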

The first version of the patch that included LBU->LB conversion changed 95k instructions across a compile of llvm-test-suite (including SPEC 2017). As was pointed out, Zcb has C_LH and C_LHU but for bytes only C_LBU (and there's also no C_LB in the baseline C extension). Therefore, we avoid converting LBU to LB as it is less compressible. This leaves us with ~36k instructions changed across a compile of llvm-test-suite.
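
For concreteness, the byte-load trade-off looks like this (illustrative only):

    lbu a0, 0(a0)           # can compress to c.lbu when Zcb is available
    lb  a0, 0(a0)           # has no compressed form in Zcb or the base C extension
    # so an LBU->LB rewrite can only lose compression, which is why the updated
    # patch avoids it.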

llvmbot (Member) commented Jun 18, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Alex Bradbury (asb)

Changes

This is currently implemented as part of RISCVOptWInstrs in order to reuse hasAllNBitUsers. However, a new home or further refactoring will be needed (see the end of this note).

Why prefer sign-extended loads?

  • Sign-extending loads are more compressible. There is no compressed LWU, and LBU and LHU are only available in Zcb.
  • Helps to minimise the diff vs RV32 (e.g. LWU vs LW)
  • Helps to minimise distracting diffs vs GCC. I see this come up frequently when comparing GCC code and in these cases it's a red herring.

Issues or open questions with this patch as it stands:

  • Doing something at the MI level makes sense as a last resort. I wonder if for some of the cases we could be producing a sign-extended load earlier on.
  • RISCVOptWInstrs is a slightly awkward home. It's currently not run for RV32, which means the LBU/LHU changes can actually add new diffs vs RV32. Potentially just this one transformation could be done for RV32.
  • Do we want to perform additional load narrowing? With the existing code, an LD will be narrowed to LW if only the lower bits are needed. The example that made me look at extending load normalisation in the first place was an LWU that immediately has a small mask applied, which could be narrowed to LB. It's not clear this has any real benefit.

This patch changes 95k instructions across a compile of llvm-test-suite (including SPEC 2017), and all tests complete successfully afterwards.


Patch is 468.27 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/144703.diff

58 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp (+45)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/double-convert.ll (+5-11)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/float-convert.ll (+5-11)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/rv64zbb.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/rv64zbkb.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/wide-scalar-shift-by-byte-multiple-legalization.ll (+96-96)
  • (modified) llvm/test/CodeGen/RISCV/atomic-signext.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/atomicrmw-cond-sub-clamp.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/bf16-promote.ll (+30-15)
  • (modified) llvm/test/CodeGen/RISCV/bfloat-convert.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/bfloat.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/ctz_zero_return_test.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/double-convert-strict.ll (+6-12)
  • (modified) llvm/test/CodeGen/RISCV/double-convert.ll (+6-12)
  • (modified) llvm/test/CodeGen/RISCV/float-convert-strict.ll (+10-22)
  • (modified) llvm/test/CodeGen/RISCV/float-convert.ll (+10-22)
  • (modified) llvm/test/CodeGen/RISCV/fold-mem-offset.ll (+132-65)
  • (modified) llvm/test/CodeGen/RISCV/half-arith.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/half-convert-strict.ll (+15-27)
  • (modified) llvm/test/CodeGen/RISCV/half-convert.ll (+21-39)
  • (modified) llvm/test/CodeGen/RISCV/hoist-global-addr-base.ll (+15-7)
  • (modified) llvm/test/CodeGen/RISCV/local-stack-slot-allocation.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/mem64.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/memcmp-optsize.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/memcmp.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/memcpy-inline.ll (+94-94)
  • (modified) llvm/test/CodeGen/RISCV/memcpy.ll (+32-32)
  • (modified) llvm/test/CodeGen/RISCV/memmove.ll (+35-35)
  • (modified) llvm/test/CodeGen/RISCV/nontemporal.ll (+160-160)
  • (modified) llvm/test/CodeGen/RISCV/prefer-w-inst.mir (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rv64zbb.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rv64zbkb.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/expandload.ll (+512-512)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vector-i8-index-cornercase.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-subvector.ll (+33-15)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll (+97-97)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-vrgather.ll (+32-15)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-mask-load-store.ll (-23)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll (+87-87)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-int.ll (+17-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-store-asm.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-vpload.ll (+2-7)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-unaligned.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwaddu.ll (+31-15)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwmulsu.ll (+31-15)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwmulu.ll (-14)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwsubu.ll (+35-17)
  • (modified) llvm/test/CodeGen/RISCV/rvv/memcpy-inline.ll (+13-13)
  • (modified) llvm/test/CodeGen/RISCV/rvv/stores-of-loads-merging.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/strided-vpload.ll (+15-6)
  • (modified) llvm/test/CodeGen/RISCV/srem-seteq-illegal-types.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/stack-clash-prologue.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/unaligned-load-store.ll (+13-13)
  • (modified) llvm/test/CodeGen/RISCV/urem-seteq-illegal-types.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/urem-vector-lkk.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/wide-scalar-shift-by-byte-multiple-legalization.ll (+132-132)
  • (modified) llvm/test/CodeGen/RISCV/wide-scalar-shift-legalization.ll (+72-72)
  • (modified) llvm/test/CodeGen/RISCV/zdinx-boundary-check.ll (+3-3)
diff --git a/llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp b/llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp
index ed61236415ccf..a141bae55b70f 100644
--- a/llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp
+++ b/llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp
@@ -71,6 +71,8 @@ class RISCVOptWInstrs : public MachineFunctionPass {
                       const RISCVSubtarget &ST, MachineRegisterInfo &MRI);
   bool appendWSuffixes(MachineFunction &MF, const RISCVInstrInfo &TII,
                        const RISCVSubtarget &ST, MachineRegisterInfo &MRI);
+  bool convertZExtLoads(MachineFunction &MF, const RISCVInstrInfo &TII,
+                       const RISCVSubtarget &ST, MachineRegisterInfo &MRI);
 
   void getAnalysisUsage(AnalysisUsage &AU) const override {
     AU.setPreservesCFG();
@@ -788,6 +790,47 @@ bool RISCVOptWInstrs::appendWSuffixes(MachineFunction &MF,
   return MadeChange;
 }
 
+bool RISCVOptWInstrs::convertZExtLoads(MachineFunction &MF,
+                                      const RISCVInstrInfo &TII,
+                                      const RISCVSubtarget &ST,
+                                      MachineRegisterInfo &MRI) {
+  bool MadeChange = false;
+  for (MachineBasicBlock &MBB : MF) {
+    for (MachineInstr &MI : MBB) {
+      unsigned WOpc;
+      int UsersWidth;
+      switch (MI.getOpcode()) {
+      default:
+        continue;
+      case RISCV::LBU:
+        WOpc = RISCV::LB;
+        UsersWidth = 8;
+        break;
+      case RISCV::LHU:
+        WOpc = RISCV::LH;
+        UsersWidth = 16;
+        break;
+      case RISCV::LWU:
+        WOpc = RISCV::LW;
+        UsersWidth = 32;
+        break;
+      }
+
+      if (hasAllNBitUsers(MI, ST, MRI, UsersWidth)) {
+        LLVM_DEBUG(dbgs() << "Replacing " << MI);
+        MI.setDesc(TII.get(WOpc));
+        MI.clearFlag(MachineInstr::MIFlag::NoSWrap);
+        MI.clearFlag(MachineInstr::MIFlag::NoUWrap);
+        MI.clearFlag(MachineInstr::MIFlag::IsExact);
+        LLVM_DEBUG(dbgs() << "     with " << MI);
+        MadeChange = true;
+      }
+    }
+  }
+
+  return MadeChange;
+}
+
 bool RISCVOptWInstrs::runOnMachineFunction(MachineFunction &MF) {
   if (skipFunction(MF.getFunction()))
     return false;
@@ -808,5 +851,7 @@ bool RISCVOptWInstrs::runOnMachineFunction(MachineFunction &MF) {
   if (ST.preferWInst())
     MadeChange |= appendWSuffixes(MF, TII, ST, MRI);
 
+  MadeChange |= convertZExtLoads(MF, TII, ST, MRI);
+
   return MadeChange;
 }
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/double-convert.ll b/llvm/test/CodeGen/RISCV/GlobalISel/double-convert.ll
index a49e94f4bc910..620c5ecc6c1e7 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/double-convert.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/double-convert.ll
@@ -246,17 +246,11 @@ define double @fcvt_d_wu(i32 %a) nounwind {
 }
 
 define double @fcvt_d_wu_load(ptr %p) nounwind {
-; RV32IFD-LABEL: fcvt_d_wu_load:
-; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    lw a0, 0(a0)
-; RV32IFD-NEXT:    fcvt.d.wu fa0, a0
-; RV32IFD-NEXT:    ret
-;
-; RV64IFD-LABEL: fcvt_d_wu_load:
-; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    lwu a0, 0(a0)
-; RV64IFD-NEXT:    fcvt.d.wu fa0, a0
-; RV64IFD-NEXT:    ret
+; CHECKIFD-LABEL: fcvt_d_wu_load:
+; CHECKIFD:       # %bb.0:
+; CHECKIFD-NEXT:    lw a0, 0(a0)
+; CHECKIFD-NEXT:    fcvt.d.wu fa0, a0
+; CHECKIFD-NEXT:    ret
 ;
 ; RV32I-LABEL: fcvt_d_wu_load:
 ; RV32I:       # %bb.0:
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/float-convert.ll b/llvm/test/CodeGen/RISCV/GlobalISel/float-convert.ll
index fa093623dd6f8..bbea7929a304e 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/float-convert.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/float-convert.ll
@@ -232,17 +232,11 @@ define float @fcvt_s_wu(i32 %a) nounwind {
 }
 
 define float @fcvt_s_wu_load(ptr %p) nounwind {
-; RV32IF-LABEL: fcvt_s_wu_load:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    lw a0, 0(a0)
-; RV32IF-NEXT:    fcvt.s.wu fa0, a0
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: fcvt_s_wu_load:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    lwu a0, 0(a0)
-; RV64IF-NEXT:    fcvt.s.wu fa0, a0
-; RV64IF-NEXT:    ret
+; CHECKIF-LABEL: fcvt_s_wu_load:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    lw a0, 0(a0)
+; CHECKIF-NEXT:    fcvt.s.wu fa0, a0
+; CHECKIF-NEXT:    ret
 ;
 ; RV32I-LABEL: fcvt_s_wu_load:
 ; RV32I:       # %bb.0:
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbb.ll b/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbb.ll
index 9690302552090..65838f51fc920 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbb.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbb.ll
@@ -748,7 +748,7 @@ define signext i32 @ctpop_i32_load(ptr %p) nounwind {
 ;
 ; RV64ZBB-LABEL: ctpop_i32_load:
 ; RV64ZBB:       # %bb.0:
-; RV64ZBB-NEXT:    lwu a0, 0(a0)
+; RV64ZBB-NEXT:    lw a0, 0(a0)
 ; RV64ZBB-NEXT:    cpopw a0, a0
 ; RV64ZBB-NEXT:    ret
   %a = load i32, ptr %p
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbkb.ll b/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbkb.ll
index cd59c9e01806d..ba058ca0b500a 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbkb.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbkb.ll
@@ -114,7 +114,7 @@ define i64 @pack_i64_2(i32 signext %a, i32 signext %b) nounwind {
 define i64 @pack_i64_3(ptr %0, ptr %1) {
 ; RV64I-LABEL: pack_i64_3:
 ; RV64I:       # %bb.0:
-; RV64I-NEXT:    lwu a0, 0(a0)
+; RV64I-NEXT:    lw a0, 0(a0)
 ; RV64I-NEXT:    lwu a1, 0(a1)
 ; RV64I-NEXT:    slli a0, a0, 32
 ; RV64I-NEXT:    or a0, a0, a1
@@ -122,8 +122,8 @@ define i64 @pack_i64_3(ptr %0, ptr %1) {
 ;
 ; RV64ZBKB-LABEL: pack_i64_3:
 ; RV64ZBKB:       # %bb.0:
-; RV64ZBKB-NEXT:    lwu a0, 0(a0)
-; RV64ZBKB-NEXT:    lwu a1, 0(a1)
+; RV64ZBKB-NEXT:    lw a0, 0(a0)
+; RV64ZBKB-NEXT:    lw a1, 0(a1)
 ; RV64ZBKB-NEXT:    pack a0, a1, a0
 ; RV64ZBKB-NEXT:    ret
   %3 = load i32, ptr %0, align 4
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/wide-scalar-shift-by-byte-multiple-legalization.ll b/llvm/test/CodeGen/RISCV/GlobalISel/wide-scalar-shift-by-byte-multiple-legalization.ll
index 69519c00f88ea..27c6d0240f987 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/wide-scalar-shift-by-byte-multiple-legalization.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/wide-scalar-shift-by-byte-multiple-legalization.ll
@@ -8,13 +8,13 @@ define void @lshr_4bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a3, 1(a0)
 ; RV64I-NEXT:    lbu a4, 0(a0)
 ; RV64I-NEXT:    lbu a5, 2(a0)
-; RV64I-NEXT:    lbu a0, 3(a0)
+; RV64I-NEXT:    lb a0, 3(a0)
 ; RV64I-NEXT:    slli a3, a3, 8
 ; RV64I-NEXT:    or a3, a3, a4
 ; RV64I-NEXT:    lbu a4, 0(a1)
 ; RV64I-NEXT:    lbu a6, 1(a1)
 ; RV64I-NEXT:    lbu a7, 2(a1)
-; RV64I-NEXT:    lbu a1, 3(a1)
+; RV64I-NEXT:    lb a1, 3(a1)
 ; RV64I-NEXT:    slli a0, a0, 8
 ; RV64I-NEXT:    or a0, a0, a5
 ; RV64I-NEXT:    slli a6, a6, 8
@@ -85,13 +85,13 @@ define void @shl_4bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a3, 1(a0)
 ; RV64I-NEXT:    lbu a4, 0(a0)
 ; RV64I-NEXT:    lbu a5, 2(a0)
-; RV64I-NEXT:    lbu a0, 3(a0)
+; RV64I-NEXT:    lb a0, 3(a0)
 ; RV64I-NEXT:    slli a3, a3, 8
 ; RV64I-NEXT:    or a3, a3, a4
 ; RV64I-NEXT:    lbu a4, 0(a1)
 ; RV64I-NEXT:    lbu a6, 1(a1)
 ; RV64I-NEXT:    lbu a7, 2(a1)
-; RV64I-NEXT:    lbu a1, 3(a1)
+; RV64I-NEXT:    lb a1, 3(a1)
 ; RV64I-NEXT:    slli a0, a0, 8
 ; RV64I-NEXT:    or a0, a0, a5
 ; RV64I-NEXT:    slli a6, a6, 8
@@ -162,13 +162,13 @@ define void @ashr_4bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a3, 1(a0)
 ; RV64I-NEXT:    lbu a4, 0(a0)
 ; RV64I-NEXT:    lbu a5, 2(a0)
-; RV64I-NEXT:    lbu a0, 3(a0)
+; RV64I-NEXT:    lb a0, 3(a0)
 ; RV64I-NEXT:    slli a3, a3, 8
 ; RV64I-NEXT:    or a3, a3, a4
 ; RV64I-NEXT:    lbu a4, 0(a1)
 ; RV64I-NEXT:    lbu a6, 1(a1)
 ; RV64I-NEXT:    lbu a7, 2(a1)
-; RV64I-NEXT:    lbu a1, 3(a1)
+; RV64I-NEXT:    lb a1, 3(a1)
 ; RV64I-NEXT:    slli a0, a0, 8
 ; RV64I-NEXT:    or a0, a0, a5
 ; RV64I-NEXT:    slli a6, a6, 8
@@ -244,25 +244,25 @@ define void @lshr_8bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu a0, 7(a0)
+; RV64I-NEXT:    lb a0, 7(a0)
 ; RV64I-NEXT:    slli a4, a4, 8
 ; RV64I-NEXT:    slli a6, a6, 8
 ; RV64I-NEXT:    or a3, a4, a3
 ; RV64I-NEXT:    or a4, a6, a5
-; RV64I-NEXT:    lbu a5, 0(a1)
-; RV64I-NEXT:    lbu a6, 1(a1)
-; RV64I-NEXT:    lbu t2, 2(a1)
-; RV64I-NEXT:    lbu t3, 3(a1)
+; RV64I-NEXT:    lb a5, 0(a1)
+; RV64I-NEXT:    lb a6, 1(a1)
+; RV64I-NEXT:    lb t2, 2(a1)
+; RV64I-NEXT:    lb t3, 3(a1)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli a0, a0, 8
 ; RV64I-NEXT:    slli a6, a6, 8
 ; RV64I-NEXT:    or a7, t0, a7
 ; RV64I-NEXT:    or a0, a0, t1
 ; RV64I-NEXT:    or a5, a6, a5
-; RV64I-NEXT:    lbu a6, 4(a1)
-; RV64I-NEXT:    lbu t0, 5(a1)
-; RV64I-NEXT:    lbu t1, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a6, 4(a1)
+; RV64I-NEXT:    lb t0, 5(a1)
+; RV64I-NEXT:    lb t1, 6(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t3, t3, 8
 ; RV64I-NEXT:    or t2, t3, t2
 ; RV64I-NEXT:    slli t0, t0, 8
@@ -395,25 +395,25 @@ define void @shl_8bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu a0, 7(a0)
+; RV64I-NEXT:    lb a0, 7(a0)
 ; RV64I-NEXT:    slli a4, a4, 8
 ; RV64I-NEXT:    slli a6, a6, 8
 ; RV64I-NEXT:    or a3, a4, a3
 ; RV64I-NEXT:    or a4, a6, a5
-; RV64I-NEXT:    lbu a5, 0(a1)
-; RV64I-NEXT:    lbu a6, 1(a1)
-; RV64I-NEXT:    lbu t2, 2(a1)
-; RV64I-NEXT:    lbu t3, 3(a1)
+; RV64I-NEXT:    lb a5, 0(a1)
+; RV64I-NEXT:    lb a6, 1(a1)
+; RV64I-NEXT:    lb t2, 2(a1)
+; RV64I-NEXT:    lb t3, 3(a1)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli a0, a0, 8
 ; RV64I-NEXT:    slli a6, a6, 8
 ; RV64I-NEXT:    or a7, t0, a7
 ; RV64I-NEXT:    or a0, a0, t1
 ; RV64I-NEXT:    or a5, a6, a5
-; RV64I-NEXT:    lbu a6, 4(a1)
-; RV64I-NEXT:    lbu t0, 5(a1)
-; RV64I-NEXT:    lbu t1, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a6, 4(a1)
+; RV64I-NEXT:    lb t0, 5(a1)
+; RV64I-NEXT:    lb t1, 6(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t3, t3, 8
 ; RV64I-NEXT:    or t2, t3, t2
 ; RV64I-NEXT:    slli t0, t0, 8
@@ -541,25 +541,25 @@ define void @ashr_8bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu a0, 7(a0)
+; RV64I-NEXT:    lb a0, 7(a0)
 ; RV64I-NEXT:    slli a4, a4, 8
 ; RV64I-NEXT:    slli a6, a6, 8
 ; RV64I-NEXT:    or a3, a4, a3
 ; RV64I-NEXT:    or a4, a6, a5
-; RV64I-NEXT:    lbu a5, 0(a1)
-; RV64I-NEXT:    lbu a6, 1(a1)
-; RV64I-NEXT:    lbu t2, 2(a1)
-; RV64I-NEXT:    lbu t3, 3(a1)
+; RV64I-NEXT:    lb a5, 0(a1)
+; RV64I-NEXT:    lb a6, 1(a1)
+; RV64I-NEXT:    lb t2, 2(a1)
+; RV64I-NEXT:    lb t3, 3(a1)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli a0, a0, 8
 ; RV64I-NEXT:    slli a6, a6, 8
 ; RV64I-NEXT:    or a7, t0, a7
 ; RV64I-NEXT:    or a0, a0, t1
 ; RV64I-NEXT:    or a5, a6, a5
-; RV64I-NEXT:    lbu a6, 4(a1)
-; RV64I-NEXT:    lbu t0, 5(a1)
-; RV64I-NEXT:    lbu t1, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a6, 4(a1)
+; RV64I-NEXT:    lb t0, 5(a1)
+; RV64I-NEXT:    lb t1, 6(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t3, t3, 8
 ; RV64I-NEXT:    or t2, t3, t2
 ; RV64I-NEXT:    slli t0, t0, 8
@@ -695,7 +695,7 @@ define void @lshr_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -707,7 +707,7 @@ define void @lshr_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a5, 12(a0)
 ; RV64I-NEXT:    lbu a6, 13(a0)
 ; RV64I-NEXT:    lbu s0, 14(a0)
-; RV64I-NEXT:    lbu a0, 15(a0)
+; RV64I-NEXT:    lb a0, 15(a0)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli t2, t2, 8
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -729,7 +729,7 @@ define void @lshr_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu t3, 4(a1)
 ; RV64I-NEXT:    lbu t4, 5(a1)
 ; RV64I-NEXT:    lbu s0, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t6, t6, 8
 ; RV64I-NEXT:    or t5, t6, t5
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -1028,7 +1028,7 @@ define void @lshr_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -1040,7 +1040,7 @@ define void @lshr_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu a5, 12(a0)
 ; RV64I-NEXT:    lbu a6, 13(a0)
 ; RV64I-NEXT:    lbu s0, 14(a0)
-; RV64I-NEXT:    lbu a0, 15(a0)
+; RV64I-NEXT:    lb a0, 15(a0)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli t2, t2, 8
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -1062,7 +1062,7 @@ define void @lshr_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu t3, 4(a1)
 ; RV64I-NEXT:    lbu t4, 5(a1)
 ; RV64I-NEXT:    lbu s0, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t6, t6, 8
 ; RV64I-NEXT:    or t5, t6, t5
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -1361,7 +1361,7 @@ define void @shl_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -1373,7 +1373,7 @@ define void @shl_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a5, 12(a0)
 ; RV64I-NEXT:    lbu a6, 13(a0)
 ; RV64I-NEXT:    lbu s0, 14(a0)
-; RV64I-NEXT:    lbu a0, 15(a0)
+; RV64I-NEXT:    lb a0, 15(a0)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli t2, t2, 8
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -1395,7 +1395,7 @@ define void @shl_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu t3, 4(a1)
 ; RV64I-NEXT:    lbu t4, 5(a1)
 ; RV64I-NEXT:    lbu s0, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t6, t6, 8
 ; RV64I-NEXT:    or t5, t6, t5
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -1690,7 +1690,7 @@ define void @shl_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) nounw
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -1702,7 +1702,7 @@ define void @shl_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) nounw
 ; RV64I-NEXT:    lbu a5, 12(a0)
 ; RV64I-NEXT:    lbu a6, 13(a0)
 ; RV64I-NEXT:    lbu s0, 14(a0)
-; RV64I-NEXT:    lbu a0, 15(a0)
+; RV64I-NEXT:    lb a0, 15(a0)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli t2, t2, 8
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -1724,7 +1724,7 @@ define void @shl_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) nounw
 ; RV64I-NEXT:    lbu t3, 4(a1)
 ; RV64I-NEXT:    lbu t4, 5(a1)
 ; RV64I-NEXT:    lbu s0, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t6, t6, 8
 ; RV64I-NEXT:    or t5, t6, t5
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -2020,7 +2020,7 @@ define void @ashr_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -2032,7 +2032,7 @@ define void @ashr_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a5, 12(a0)
 ; RV64I-NEXT:    lbu a6, 13(a0)
 ; RV64I-NEXT:    lbu s0, 14(a0)
-; RV64I-NEXT:    lbu a0, 15(a0)
+; RV64I-NEXT:    lb a0, 15(a0)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli t2, t2, 8
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -2054,7 +2054,7 @@ define void @ashr_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu t3, 4(a1)
 ; RV64I-NEXT:    lbu t4, 5(a1)
 ; RV64I-NEXT:    lbu s0, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t6, t6, 8
 ; RV64I-NEXT:    or t5, t6, t5
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -2353,7 +2353,7 @@ define void @ashr_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -2365,7 +2365,7 @@ define void @ashr_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu a5, 12(a0)
 ; RV64I-NEXT:    lbu a6, 13(a0)
 ; RV64I-NEXT:    lbu s0, 14(a0)
-; RV64I-NEXT:    lbu a0, 15(a0)
+; RV64I-NEXT:    lb a0, 15(a0)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli t2, t2, 8
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -2387,7 +2387,7 @@ define void @ashr_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu t3, 4(a1)
 ; RV64I-NEXT:    lbu t4, 5(a1)
 ; RV64I-NEXT:    lbu s0, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t6, t6, 8
 ; RV64I-NEXT:    or t5, t6, t5
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -2697,7 +2697,7 @@ define void @lshr_32bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -2705,7 +2705,7 @@ define void @lshr_32bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu s0, 12(a0)
 ; RV64I-NEXT:    lbu s1, 13(a0)
 ; RV64I-NEXT:    lbu s2, 14(a0)
-; RV64I-NEXT:    lbu s3, 15(a0)
+; RV64I-NEXT:    lb s3, 15(a0)
 ; RV64I-NEXT:    lbu s4, 16(a0)
 ; RV64I-NEXT:    lbu s5, 17(a0)
 ; RV64I-NEXT:    lbu s6, 18(a0)
@@ -2719,7 +2719,7 @@ define void @lshr_32bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu s8, 20(a0)
 ; RV64I-NEXT:    lbu s9, 21(a0)
 ; RV64I-NEXT:    lbu s10, 22(a0)
-; RV64I-NEXT:    lbu s11, 23(a0)
+; RV64I-NEXT:    lb s11, 23(a0)
 ; RV64I-NEXT:    slli t2, t2, 8
 ; RV64I-NEXT:    slli t4, t4, 8
 ; RV64I-NEXT:    slli t6, t6, 8
@@ -2741,7 +2741,7 @@ define void @lshr_32bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu s2, 28(a0)
 ; RV64I-NEXT:    lbu s3, 29(a0)
 ; RV64I-NEXT:    lbu s4, 30(a0)
-; RV64I-NEXT:    lbu a0, 31(a0)
+; RV64I-NEXT:    lb a0, 31(a0)
 ; RV64I-NEXT:    slli s9, s9, 8
 ; RV64I-NEXT:    slli s11, s11, 8
 ; RV64I-NEXT:    slli t6, t6, 8
@@ -2763,7 +2763,7 @@ define void @lshr_32bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a0, 4(a1)
 ; RV64I-NEXT:    lbu s1, 5(a1)
 ; RV64I-NEXT:    lbu s4, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli s8, s8, 8
 ; RV64I-NEXT:    or s7, s8, s7
 ; RV64I-NEXT:    slli s1, s1, 8
@@ -3621,7 +3621,7 @@ define void @lshr_32bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -3629,7 +3629,7 @@ define void @lshr_32bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu s0, 12(a0)
 ; RV64I-NEXT:    lbu s1, 13(a0)
 ; RV64I-NEXT:    lbu s2, 14(a0)
-; RV64I-NEXT:    lbu s3, 15(a0)
+; RV64I-NEXT:    lb s3, 15(a0)
 ; RV64I-NEXT:    lbu s4, 16(a0)
 ; RV64I-NEXT:    ...
[truncated]

llvmbot (Member) commented Jun 18, 2025

@llvm/pr-subscribers-llvm-globalisel

github-actions bot commented Jun 18, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

topperc (Collaborator) commented Jun 18, 2025

If I remember right, there is no c.lb in Zcb. Does this make compression worse?

asb (Contributor, Author) commented Jun 18, 2025

If I remember right, there is no c.lb in Zcb. Does this make compression worse?

You're right, I'll update to exclude LB.

EDIT: Now pushed, and patch description updated.

topperc (Collaborator) commented Jun 18, 2025

Have you looked at doing this during instruction selection using the hasAllNBitUsers in RISCVISelDAGToDAG.cpp? That isn't restricted to RV64, though it won't work across basic blocks.

topperc (Collaborator) commented Jun 18, 2025

I think changing LWU->LW is useful for compression and minimizing RV32/RV64 delta.

I'm skeptical that LWU->LW and LHU->LH will help us match gcc better. Here are trivial examples where LLVM used LH/LW and gcc used LHU/LWU: https://godbolt.org/z/cbje4aT7P. I guess it could be better on average, but it certainly doesn't guarantee a match to gcc.
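
As an illustration of that ambiguity (a made-up case, not reproduced from the Godbolt link): when an unsigned 16-bit value is loaded and then only compared against a small constant, either extension is equally correct, so the two compilers can legitimately pick different forms:

    # e.g. for `return g == 7;` with g an unsigned 16-bit global, either load works:
    lhu a0, 0(a0)           # zero-extend: equals 7 exactly when the halfword is 7
    lh  a0, 0(a0)           # sign-extend: 7 sign-extends to 7, and any other
                            # halfword sign-extends to a value != 7
    # The comparison sequence that follows is the same either way.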

asb (Contributor, Author) commented Jun 19, 2025

I think changing LWU->LW is useful for compression and minimizing RV32/RV64 delta.

Agreed.

I'm skeptical that LWU->LW and LHU->LH will help us match gcc better. Here are trivial examples where LLVM used LH/LW and gcc used LHU/LWU: https://godbolt.org/z/cbje4aT7P. I guess it could be better on average, but it certainly doesn't guarantee a match to gcc.

That may be true. I spotted this while looking at a workload where we emitted some LWUs that GCC didn't, and wondered whether it was an indicator of us overall making different/worse choices around sign/zero extension (of course it turned out to be a case where either was equivalent), and I'm sure I've seen the same before. But I totally believe there are other cases where we make different choices. I'll quantify how often it kicks in, but I probably wouldn't mind dropping LHU->LH. Perhaps the argument for this kind of change is more just "canonicalisation".

Have you looked at doing this during instruction selection using the hasAllNBitUsers in RISCVISelDAGToDAG.cpp?

I'll try that and report back.
